rGAI: An R package for fitting the generalized abundance index to seasonal count data

Abstract The generalized abundance index (GAI) provides a useful tool for estimating relative population sizes and trends of seasonal invertebrates from species' count data and offers potential for inferring which external factors may influence phenology and demography through parametric descriptions of seasonal variation. We provide an R package that extends previous software with the ability to include covariates when fitting parametric GAI models, where seasonal variation is described by either a mixture of Normal distributions or a stopover model which provides estimates of life span. The package also generalizes the models to allow any number of broods/generations in the target population within a defined season. The option to perform bootstrapping, either parametrically or nonparametrically, is also provided. The new package allows models to be far more flexible when describing seasonal variation, which may be dependent on site‐specific environmental factors or consist of many broods/generations which may overlap, as demonstrated by two case studies. Our open‐source software, available at https://github.com/calliste‐fagard‐jenkin/rGAI, makes these extensions widely and freely available, allowing the complexity of GAI models used by ecologists and applied statisticians to increase accordingly.

Here, we outline the GAI approach; full details are provided in Dennis et al. (2016). Within a single year, suppose that counts are recorded at S sites each visited on at most T occasions. Each count y i,j for the ith site and jth visit is regarded as a realization of a discrete random variable, for example Poisson (alternative distributions are described later), with expectation where N i represents relative total abundance for site i, and a i,j denotes a function describing seasonal variation in counts in terms of a small set of parameters. Estimates of abundance are relative since not all individuals present during a visit will be observed, and detection is assumed to be constant (but see Matechou et al., 2014). Variation in transect length is also not accounted for, but could be by appropriately scaling N i . The GAI encompasses three options for a i,j which describe seasonal variation in counts: • Splines-seasonal variation is described by a flexible curve representing a i,j , for example using B-splines (Dennis et al., 2016). The a i,j are scaled to sum to unity, such that they describe how N i is spread over the season. The seasonal curve is typically assumed to be the same across all S sites and the smoothness of the curve is defined by the number of knots for the spline. This option corresponds closely with the method previously developed for modeling butterfly count data (Dennis et al., 2013) and is the approach typically used for abundance trend reporting-see for example Brereton et al. (2020); Fox et al. (2021).
• Mixture model-seasonal variation is taken as a mixture of B Normal probability density functions (corresponding to one or more broods within a year) so that where t i,j denotes the jth occasion, which is the time during the season typically measured by day or week, and w i,b , μ i,b and σ i,b correspond to the weight, mean, and standard deviation, respectively, for the ith site and bth brood, and ∑ B b=1 w i,b = 1, B ≥ 1. The weights w i,b describe the relative sizes of the B broods with respect to each other. As in the spline case, the mixture model is a phenomenological model, where the a i,j effectively describe how N i is spread over time, where a i,j integrate to unity.
• Stopover model-this is based upon the model proposed in Matechou et al. (2014) and weighting w i,b . ϕ k,c is the probability an individual present for c occasions and present at visit k, will remain until visit k + 1. Since ϕ k,c represents apparent survival probability from one visit to the next, adult life spans may be estimated by 1∕(1−ϕ k,c ). Unlike the spline and mixture models, the stopover model proposes a mechanism, of which the N i are a part of. Hence, the model results in complex bounds on the a i,j , where it is the emergence parameters β i,d-1 which sum to unity. See Matechou et al. (2014) for full details of this model.
Where counts, y i,j , are assumed to be Poisson, efficient model fitting of the GAI is achieved by maximizing a concentrated (or profile) likelihood with respect to only the parameters associated with {a i,j } and estimating each N i by suitably scaled site totals. An iterative approach is taken when assuming negative binomial and zero-inflated Poisson distributions, as explained in Dennis et al. (2016).
To date, the GAI has primarily been adopted as a method for producing species' abundance trends-for example, it is used annually for reporting of UK butterfly trends , which contribute to UK biodiversity indicators (Department for Environment, Food and Rural Affairs, UK, 2020). The approach has also been used in status assessments for larger moths in Britain (Fox et al., 2021;Randle et al., 2019), and in analyses of European butterfly populations (Van Swaay et al., 2020). Where the specific goal is to produce abundance trends, the spline option for describing seasonal variation is used. Here, flight periods are typically assumed to be fixed over sites (as originally developed using generalized additive models, Dennis et al., 2013), or geographical subsets of them (Schmucki et al., 2016).
However, the GAI presents wider opportunities for further insights into seasonal count data, particularly through the application of the parametric descriptions of seasonal variation from the mixture and stopover models. The rGAI package therefore extends previous software to provide accessible code that can accommodate the inclusion of relevant covariates, as well as any number of broods within a defined season. The package also provides the opportunity for wider exploration of stopover models, including estimating species' life spans.

| r G AI PACK AG E OVERVIE W
The rGAI package extends the software made available in the supplementary materials of Dennis et al. (2016). Model fitting is by maximumlikelihood, and model parameters are transformed using combinations of logarithmic (e.g., for μ and σ) or logistic (e.g., for w and ϕ) link functions. rGAI functions with simple inputs allow survey data to first be verified (for duplicate or missing entries across time, or across sites).
Then, initial values can be selected for the model fitting process, accounting for the appropriate link scale. The GAI with the mixture or stopover model description for seasonal variation can be fitted with covariate inclusion for parameters of interest, and measures of uncertainty on parameter estimates can be produced using bootstrapping methods.  (Pollard & Yates, 1993).

| Incorporating covariates
We illustrate the usage of covariates in the rGAI package, to allow for seasonal variation in counts to vary over space, by application Here, the rGAI package was used to fit a GAI with Poisson distribution and a stopover model to describe seasonal variation, where mean emergence dates were regressed linearly on northing for both broods, and the weighting parameter was described as a quadratic function of northing. Parameter estimates are given in Table 2, with constant survival probability, ϕ, and constant standard deviations for each brood (σ 1 and σ 2 ). The transform_output function was used to produce parameter estimates on the parameter scale for specified covariate values (northing), which are shown in Figure 1, as well as estimates of seasonal pattern, which are presented in Figure 2. Figures 1 and 2

| Modeling multiple broods
The mixture and stopover model formulations of the GAI are de- was not an exhaustive model comparison.

TA B L E 2
Parameter estimates (with standard errors, SE) for the GAI, fitted with Poisson distribution and stopover model, applied to UK count data for the Common Blue butterfly in 2018, where μ 1 is the mean emergence for the first brood, μ d is the difference between mean emergence times μ 1 and μ 2 , and w 1 is the weighting of the size of the first brood with respect to the second brood, such that w 1 + w 2 = 1. Note: As they vary with northing, estimates for μ 1 , μ d and w 1 are shown on the link scale (log link for μ and logistic link for w 1 ). See estimates on the parameter scale in Figure 1. Estimates for the standard deviation of the emergence period for each brood, σ 1 and σ 2 , and weekly survival probability, ϕ, are constant, and therefore shown on the parameter scale.
F I G U R E 1 Parameter estimates of mean emergence times, μ 1 and μ 2 , and mixing probability, w 1 , from fitting the GAI with Poisson distribution and a stopover model to counts of the Common Blue butterfly in 2018, with varying northing. For μ 1 and μ 2 , week 1 corresponds to the start of April. 95% confidence intervals derived by parametric bootstrap are shown.
Based on AIC, the model correctly identifies the species as having three broods ( Transformed parameter estimates for the best-fitting model are given in Table 4, along with 95% confidence intervals produced using a parametric bootstrap. Using a bootstrap also allows for the production of a confidence interval for other quantities of interest, for example, the estimated seasonal pattern, a i,j , estimates of relative site abundance, N i , and predicted counts, avoiding the use of the delta method which is likely to be complex in these cases.
This example demonstrates the potential of the GAI for modeling species with more than two generations within a season, and wider application could again involve extension to analysis over multiple years, as well as the incorporation of relevant covariates to account for spatial variation as in the previous example.

| DISCUSS ION/FUTURE AVENUE S
The rGAI package has been designed to provide easy-to-use soft- Generalization to accommodate any number of broods/generations within a season provides the opportunity for application of the GAI to species which are known to exhibit more than two broods per year, as well as to species with a less predefined number of broods, which may vary over space and time when species show phenotypic plasticity in voltinism and phenology (Macgregor et al., 2019). The rGAI package provides opportunities to better test for and identify such variation. Although this is applicable for several multivoltine butterfly species in the UK, there is even greater potential/relevance F I G U R E 2 Estimated seasonal pattern for a sample of northing values (each 100 km, from 50 to 950 km) from fitting the GAI with Poisson distribution and a stopover model to counts of the Common Blue butterfly in 2018. The area under the curve is the same for each northing value. The estimate of the mixing probability, w 1 , which describes the size of the first brood relative to the second, is given for each northing value at 100 km intervals.
Week 1 corresponds to the start of April.

F I G U R E 3
Observed mean count per week (black circles), averaged over sites, with 5% and 95% quantiles of all observed counts shown as error bars, for the Small Copper butterfly in 2018. The predicted mean count per week, averaged over sites, is shown in blue, along with predicted 5% and 95% quantiles for comparison. Predicted values are estimates from the best-fitting model from Table 3, for which parameter estimates are given in Table 4. Week 1 corresponds to the start of April.
beyond the UK, for example in Europe where species may be multivoltine in warmer parts of their range. Models for multivoltine species may also have increasing relevance as climate warming may lead to increases in species' voltinism (Altermatt, 2010).
In future releases of the rGAI package, we intend to allow survival ϕ to vary with respect to spatial covariates, or within the season in terms of time or age (Matechou et al., 2014). There is also the potential to account for variation in detection probability, to reduce bias in estimates of relative abundance, using relevant covariates (Matechou et al., 2014). The package can also be extended for multi- year fits and trend estimation; see for example Dennis et al. (2016).
We also hope that researchers may contribute new developments to the package; for example accounting for skewness in patterns of seasonal variation/emergence would be of interest (Calabrese, 2012).
The GAI is also relevant for other taxa, for example birds on migration-see Matechou et al. (2013), beetles-see Dennis et al. (2021) who model the two-year life cycle of fuliginators, Iberodorcadion fuliginator-and bees-see Matechou et al. (2018) who use a dynamic stopover model to analyze citizen science data on bumblebees, from the BeeWalk scheme. We anticipate that the rGAI package will enhance and enable further research by ecologists and applied statisticians, which can improve our understanding of changes in species' populations and phenology.

ACK N OWLED G M ENTS
The UK Butterfly Monitoring Scheme is organized and funded by Butterfly Conservation, the UK Centre for Ecology & Hydrology, British Trust for Ornithology, and the Joint Nature Conservation Committee. The UKBMS is indebted to all volunteers who contribute data to the scheme. We are thankful to Alessandro Mari and Eleni Matechou for spotting a minor error in the package.
We thank the associate editor and two anonymous referees for their insightful comments that improved this manuscript. Byron Morgan was supported by a Leverhulme research fellowship.
Funding was provided by the University of Kent research impact grant 43621.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest. Note: 95% confidence intervals are provided based on a parametric bootstrap. The means and standard deviations of the flight period are denoted by μ b and σ b , for each brood b. w 1 and w 2 describe the weighting of the size of the first and second brood, where w 1 + w 2 + w 3 = 1. r is the dispersion parameter for the negative binomial distribution.